Skip to content

Add Windows PTY support for executables and containers (HMP v1 over UDS)#133

Draft
mitchdenny wants to merge 5 commits intomainfrom
mitchdenny/with-terminal-pty
Draft

Add Windows PTY support for executables and containers (HMP v1 over UDS)#133
mitchdenny wants to merge 5 commits intomainfrom
mitchdenny/with-terminal-pty

Conversation

@mitchdenny
Copy link
Copy Markdown
Member

@mitchdenny mitchdenny commented May 4, 2026

Implements PTY support for DCP Executable resources on Windows via ConPTY, exposing the PTY to consumers (Aspire terminal host) over a per-executable Unix domain socket using HMP v1 (the Hex1b Multiplex Protocol).This is the DCP-side slice of aspire/16317 ΓÇö WithTerminal(...), tracked in dcp/6.## Scope| Slice | This PR | Follow-up ||---|---|---|| Windows executables (ConPTY) | Yes | ΓÇö || Linux/macOS executables (creack/pty) | No | Yes || Containers (docker/podman --tty) | No | Yes |The Aspire side (WithTerminal() API + terminal host that consumes HMP v1) is being prepared in parallel and will land against this DCP version.## APIapi/v1/terminal_types.go adds TerminalSpec:gotype TerminalSpec struct { Enabled bool UDSPath string Cols int32 Rows int32}````api/v1/executable_types.go` adds `ExecutableSpec.Terminal *TerminalSpec` (nil = current log-streaming behavior, set = PTY-attached + HMP v1 server).## Implementation- `internal/hmp1/server.go` ΓÇö HMP v1 protocol server: `[type:1B][length:4B LE][payload:N]` framing, frame types Hello/StateSync/Output/Input/Resize/Exit. Bidirectional pumps with graceful teardown via context-cancellation read-deadline nudge.- `internal/exerunners/pty_windows.go` ΓÇö wraps the `UserExistsError/conpty` package as `hmp1.PTY`. Includes Windows argv-quoting for `CreateProcessW`.- `internal/exerunners/pty_other.go` ΓÇö non-Windows stub returning `ErrTerminalNotSupported`.- `internal/exerunners/terminal_session.go` ΓÇö per-executable UDS listener + accept loop. Single-viewer model (next connection bumps the previous). Graceful shutdown on natural process exit waits for in-flight handlers to flush their `Exit` frame before closing the listener.- `internal/exerunners/process_executable_runner.go` ΓÇö branches on `exe.Spec.Terminal.Enabled` in `StartRun` and tears the session down in `StopRun`. Real exit code propagates through `OnRunCompleted`.## Tests- 6 HMP v1 protocol unit tests (`internal/hmp1/server_test.go`) ΓÇö Hello/StateSync emission, output forwarding, input/resize delivery, Exit on PTY exit, oversize-frame rejection.- End-to-end smoke test (`internal/exerunners/terminal_session_windows_test.go`, build-tagged `windows`) ΓÇö spawns `cmd.exe /c "echo hello-world-from-conpty && exit 0"` under ConPTY at 100x30, dials the UDS as an HMP v1 client, asserts the full `Hello -> StateSync -> Output -> Exit(code=0)` sequence with the literal echoed text in the captured Output stream.## Validation# Windowsgo build ./...go test ./internal/hmp1 ./internal/exerunners -count 3```(Run with -count 3 deliberately ΓÇö the HMP v1 server has a few timing-sensitive paths (deadline nudges, graceful shutdown) that single-pass runs would not reliably catch.)## Not in this PR- Containers (`Container.Spec.Terminal`).- Linux/macOS executables.- Aspire-side `WithTerminal()` plumbing.- DCP version bump in Aspire.

Companion PR

Aspire side: microsoft/aspire#16760 — WithTerminal(): per-replica interactive terminal sessions (Aspire side, draft)

Phase 3 of Aspire WithTerminal end-to-end. Adds DCP-side terminal support for Executable resources on Windows via ConPTY. The PTY-attached process is reachable from the Aspire terminal host through a Hex1b Multiplex Protocol v1 server on a per-replica Unix domain socket.

API:

  - api/v1/terminal_types.go: TerminalSpec (Enabled, UDSPath, Cols, Rows)

  - api/v1/executable_types.go: ExecutableSpec.Terminal *TerminalSpec

Implementation:

  - internal/hmp1/: HMP v1 server (Hello/StateSync/Output/Input/Resize/Exit)

  - internal/exerunners/pty_windows.go: ConPTY wrapper via UserExistsError/conpty

  - internal/exerunners/pty_other.go: non-Windows stub returning ErrTerminalNotSupported

  - internal/exerunners/terminal_session.go: per-executable UDS listener + accept loop, single-viewer model with graceful shutdown that flushes the Exit frame

  - internal/exerunners/process_executable_runner.go: branches on Terminal.Enabled in StartRun and tears the session down in StopRun

Tests:

  - internal/hmp1/server_test.go: 6 unit tests covering Hello/StateSync, output forwarding, input/resize delivery, Exit on PTY exit, oversize-frame rejection

  - internal/exerunners/terminal_session_windows_test.go: end-to-end smoke test that spawns cmd.exe under ConPTY, dials the UDS as an HMP v1 client, and asserts the full Hello -> StateSync -> Output -> Exit frame sequence with the literal echoed text and exit code 0
Refactors the Windows PTY/HMP v1 plumbing introduced in PR #133 from
internal/exerunners into a shared internal/termpty package, then layers
container terminal support on top using the same primitives.

Refactor (no behavior change for executables):
- New internal/termpty package with Process, Session, CommandSpec,
  SessionConfig and BuildWindowsCommandLine.
- internal/exerunners now consumes termpty.StartProcess and
  termpty.StartSession; the executable terminal session windows test
  moves into termpty/session_windows_test.go.

Container PTY:
- ContainerSpec gains a Terminal *TerminalSpec field (mirrors the
  shape on ExecutableSpec).
- Docker and Podman applyCreateContainerOptions append "-t -i" to
  `docker create` / `podman create` when Terminal is enabled, so the
  container runtime allocates a real PTY for the container's PID 1.
- New controllers/container_terminal.go::startContainerTerminalSession
  spawns the orchestrator's CLI as `<runtime> start --attach
  --interactive <id>` under a host ConPTY, exposing the resulting
  byte stream as an HMP v1 producer at TerminalSpec.UDSPath.
- runningContainerData tracks the live *termpty.Session; the new
  closeTerminalSession helper is invoked from deleteContainer so the
  attach process is reaped during teardown.
- ContainerReconciler.ensureContainerTerminalSession invokes the
  helper after a successful start (both the simple-network and
  custom-network code paths). Terminal attach errors are non-fatal:
  the container keeps running, just without an attachable terminal.

The terminal attach process exiting (because the container exited or
was removed) signals container exit through the same Session.Done()
channel that the executable runner already uses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mitchdenny
Copy link
Copy Markdown
Member Author

Container support has been split out and stacked on top of this PR: #138 (DCP side) + microsoft/aspire#16762 (Aspire side).

docker container start <id> is documented to start STOPPED containers; on
a running container, '--attach --interactive' is a no-op and the host PTY
never wires up to PID 1's stdin/stdout. Because the container reconciler
already starts the container detached (via StartContainers) before
ensureContainerTerminalSession runs, the second 'start' call did nothing
and the host saw no terminal traffic.

Switch to 'attach' on the running container with:

  --sig-proxy=false  - keep host-process Ctrl-C from killing the
                       container; signals are delivered in-band via
                       HMP v1 input bytes (e.g. 0x03).
  --detach-keys=     - disable the default Ctrl-P,Ctrl-Q detach so
                       those keystrokes pass through as raw bytes.

Container creation already opens TTY+stdin via -t -i (in
applyCreateContainerOptions), so 'attach' delivers a usable
bidirectional PTY stream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mitchdenny mitchdenny changed the title Add Windows PTY support for executables (HMP v1 over UDS) Add Windows PTY support for executables and containers (HMP v1 over UDS) May 5, 2026
mitchdenny and others added 2 commits May 5, 2026 16:05
Previously DCP listened on the producer UDS and the Aspire terminal host
dialed in. That ordering meant the terminal host could only attach AFTER
DCP had already started the PTY-attached process AND opened its listener.
Anything the underlying shell/process printed in that window (notably
bash's first PS1 prompt) was lost, because docker attach (and a fresh PTY
read pump in general) does not replay buffered TTY output.

This commit reverses just the TCP role on the producer link:

* Terminal host listens on the producer UDS (it owns the file now).
* DCP dials in via dialUDSWithRetry (50 x 100ms) which absorbs the small
  window between the host process being marked running and its listener
  actually being bound. The resource-model WaitAnnotation already
  sequences the host before any resource that uses it.
* The HMP v1 protocol roles are unchanged: DCP still serves the protocol
  (it owns the PTY data source); the host still consumes it. Only the
  underlying TCP direction flipped.

Session lifecycle simplifications that fall out of the flip:

* No listener field, no acceptLoop, no per-session multi-viewer slot.
  Exactly one connection per session.
* DCP no longer owns the UDS file; os.Remove calls in StartSession,
  watchExit, and Close are gone.
* The connection is established eagerly in StartSession, so a dial
  failure surfaces immediately to the caller.

Why this fixes the missed-prompt symptom: with the new direction the
terminal host's listener exists before DCP attempts to dial. Once DCP
dials, hmp1.Serve immediately starts pumping PTY output. The terminal
host's headless presentation captures that output into its own
scrollback, and any dashboard/CLI viewer that connects later gets a
StateSync replay from the host - even if their attach happens many
seconds after the shell's first prompt.

The Windows test now plays the listening side: net.Listen first, then
StartSession (which dials), then Accept.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously Session.Close() and watchExit() both raced to close the same
ConPTY handles (PseudoConsole, pi.Process, pi.Thread, pty I/O pipes)
while cp.Wait() was still polling WaitForSingleObject(pi.Process) in a
1-second loop. The double-close pattern is dangerous on Windows: when a
HANDLE value is closed and the kernel recycles it for an unrelated
object in the same process, a second CloseHandle on the original
numeric value closes the unrelated object.

If the recycled handle happens to be DCP's process cleanup job
(JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE in pkg/process/os_executor_windows.go),
Windows immediately terminates every process assigned to that job. In
the Aspire WithTerminal scenario this means clicking Stop on a single
PTY-attached resource cascade-kills the dashboard, every other
terminal-host replica, and every sibling executable/container managed
by the same DCP — while DCP itself stays alive because its own handle
table is fine.

Refactor watchExit() to be the *sole* owner of session teardown:

* The PTY wait is isolated to a private inner goroutine that is the
  only code that ever observes pi.Process. It exits when the process
  exits naturally OR when watchExit closes the PTY.
* watchExit selects on (waitDone, stopCh): natural exit takes the
  first branch; explicit Close() takes the second by closing stopCh.
* In both branches watchExit calls PTY.Close exactly once, drains the
  inner wait goroutine, gives the in-flight HMP v1 handler a bounded
  window to flush its Exit frame, and finally closes the connection.
* Close() no longer touches any conpty handles or the connection. It
  just signals teardown via stopCh (guarded by sync.Once) and blocks
  on doneCh with a 5s safety timeout.

This eliminates the double-close pattern entirely. PTY handles are
closed exactly once and only after the wait goroutine that owns them
has been drained.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mitchdenny
Copy link
Copy Markdown
Member Author

Fix: cascade kill on Stop of WithTerminal-attached resource

Pushed 38f88c9 to address a Windows-specific bug discovered during end-to-end testing of WithTerminal in the Aspire repo.

Symptom

Clicking Stop in the Aspire dashboard on a resource started under .WithTerminal(...) would kill not just that resource, but every other DCP-managed child in the same run: the Aspire dashboard itself, all �spire.terminalhost replicas, and every sibling executable/container — leaving only DCP, the AppHost, and the CLI alive. Stopping a non-PTY resource never reproduced this. A 10-second heartbeat in the dashboard process would stop emitting between consecutive ticks (i.e. process death between heartbeats), with no managed exception, no ProcessExit, no UnobservedTaskException.

Root cause

\Session.Close()\ and \watchExit()\ both raced to close the same conpty handles (\PseudoConsole, \pi.Process, \pi.Thread, the four PTY pipe handles) while \cp.Wait()\ was concurrently polling \WaitForSingleObject(pi.Process)\ in a 1-second loop (\internal/termpty/session.go:185-212\ and \pkg/mod/github.com/UserExistsError/conpty@v0.1.4/conpty.go:252-261).

The double-close pattern is dangerous on Windows. When a HANDLE value is closed and the kernel recycles it for an unrelated kernel object in the same process, a second \CloseHandle\ on the original numeric value closes the unrelated object. If that recycled handle happens to be DCP's process cleanup job handle (created in \pkg/process/os_executor_windows.go:209-238\ with \JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE), the kernel immediately terminates every process assigned to that job — which is exactly what we saw.

Fix

Refactor \watchExit()\ to be the sole owner of session teardown:

  • The PTY wait is isolated to a private inner goroutine that is the only code that ever observes \pi.Process.
  • \watchExit\ selects on (waitDone, stopCh): natural exit takes the first branch; explicit \Close()\ takes the second by closing \stopCh.
  • Both branches converge on a single \PTY.Close\ call, then drain the inner wait goroutine, then give the in-flight HMP v1 handler a bounded window to flush its Exit frame, then close the connection.
  • \Close()\ no longer touches any conpty handles or the connection. It just signals teardown via \stopCh\ (guarded by \sync.Once) and blocks on \doneCh\ with a 5s safety timeout.

Tests: \go test ./internal/termpty/... ./internal/exerunners/... ./internal/hmp1/... ./controllers/...\ all green; \go vet ./...\ clean. Will validate against the live Aspire repro next.

Copy link
Copy Markdown
Collaborator

@karolz-ms karolz-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes for myself to check/clean things up

Comment thread api/v1/container_types.go
// capture. Terminal mode requires Windows on the host (ConPTY) for the
// initial slice; non-Windows hosts return ErrTerminalNotSupported at
// startup.
Terminal *TerminalSpec `json:"terminal,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be included in the validation (for both initial object creation and object update) and in the lifecycle key calculation.

Comment thread api/v1/terminal_types.go
//
// When Enabled is true, DCP allocates a PTY for the underlying process and
// listens on UDSPath (a Unix Domain Socket path on Linux/macOS, or a named pipe
// path on Windows in a follow-up). When the terminal host opens an HMP v1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not Unix domain sockets on Windows? We use them for pumping DCP logs into Aspire dashboard already...

Comment thread api/v1/terminal_types.go
// - Process exit -> HMP v1 Exit frame, then close
//
// The HMP v1 wire format is defined by the Aspire dashboard's terminal host
// (see Hex1b's Hmp1Protocol). DCP's responsibility is limited to PTY
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a URL to the HMP spec here.

Comment thread api/v1/terminal_types.go
type TerminalSpec struct {
// Enabled controls whether DCP allocates a PTY for the process and exposes
// an HMP v1 producer endpoint at UDSPath.
Enabled bool `json:"enabled,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this flag? What does it mean to have TerminalSpec present in an Executable or Controller spec (*Terminal is not nil) but Enabled is false? Is this a valid spec and if so, what is the semantics?

Comment thread api/v1/terminal_types.go

// UDSPath is the Unix Domain Socket path that DCP listens on for the
// terminal host's HMP v1 client connection. Required when Enabled is true.
UDSPath string `json:"udsPath,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support changing UDSPath for existing Container or Executable? If yes, we should handle it in the controller(s). If not we should prevent that change via validation.

// uses. Empty strings, strings containing whitespace, or strings containing
// quotes get wrapped in double quotes; backslashes preceding a quote are
// doubled.
func quoteWindowsArg(arg string) string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is Windows API to do this

Comment thread internal/termpty/pty.go
// CommandLine is the full command line in CreateProcessW form (a single
// string with arguments embedded and quoted as appropriate). Callers can
// build this via BuildWindowsCommandLine.
CommandLine string
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not... should prefer arg array

Comment thread internal/termpty/pty.go

// Env is the environment block passed to the child process (each entry
// is "KEY=VALUE"). May be nil to inherit the parent's environment.
Env []string
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map

every = 100 * time.Millisecond
)
var lastErr error
for i := 0; i < attempts; i++ {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have reliability package for retries


tp *Process

mu sync.Mutex
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of synchronization primitives used here. Need to dig into this component and simplify it.

@karolz-ms karolz-ms self-assigned this May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants